Keir Fraser [Wed, 23 Sep 2009 17:18:29 +0000 (18:18 +0100)]
Keir Fraser [Tue, 22 Sep 2009 13:19:38 +0000 (14:19 +0100)]
EPT: Assert p2m is locked in ept_sync_domain().
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Keir Fraser [Tue, 22 Sep 2009 13:18:51 +0000 (14:18 +0100)]
x86: Support more than 256 pins of ioapic.
Some large systems may have many IOAPICs with more than 256 pins in
total. To support this case, just let pirq == irq and build a 1:1
mapping between them. This is based on the assumption that
pirq == GSI number in dom0 for IOAPIC IRQs.
Thanks to Jan Beulich from Novell for reporting the issue in pv_ops
dom0.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Keir Fraser [Tue, 22 Sep 2009 13:11:09 +0000 (14:11 +0100)]
x86: Fix the build.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Tue, 22 Sep 2009 08:18:25 +0000 (09:18 +0100)]
EPT: More efficient ept_sync_domain().
Rather than always flushing all CPUs, only flush CPUs this domain is
currently active on, and defer flushing other CPUs until this domain
is scheduled onto them (or the domain is destroyed).
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
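The deferred-flush idea described above can be sketched as follows. This is an illustrative model only: the class, its fields and the method names are hypothetical and are not taken from the Xen sources.

```python
# Illustrative sketch only: all names here are hypothetical, not Xen's code.
class Domain:
    def __init__(self, all_cpus):
        self.all_cpus = set(all_cpus)
        self.active_cpus = set()   # CPUs the domain is currently active on
        self.needs_flush = set()   # CPUs whose flush has been deferred
        self.flushed = []          # record of flushes, for demonstration

    def sync(self):
        """Flush the active CPUs now and defer all the others."""
        for cpu in sorted(self.active_cpus):
            self.flushed.append(cpu)
        self.needs_flush = self.all_cpus - self.active_cpus

    def schedule_onto(self, cpu):
        """Run the domain on `cpu`, doing any deferred flush first."""
        if cpu in self.needs_flush:
            self.flushed.append(cpu)
            self.needs_flush.discard(cpu)
        self.active_cpus.add(cpu)
```

With four CPUs and the domain active only on CPU 0, sync() flushes CPU 0 alone; CPU 2 is flushed later, at the moment the domain is scheduled onto it.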
Keir Fraser [Tue, 22 Sep 2009 07:37:32 +0000 (08:37 +0100)]
mca: Fix several issues for MCA UCR error handling
This patch is for fixing several issues for MCA UCR error handling on
latest Intel platforms, including:
1) For UCR error, the is 0xC0 ~ 0xCF instead of just C0
2) Synchronization issues for clearing error finding flag and clearing
global MCIP flag. Otherwise, in some cases, MCIP flag can't be cleared.
Signed-off-by: Liping Ke <liping.ke@intel.com>
Keir Fraser [Tue, 22 Sep 2009 07:36:40 +0000 (08:36 +0100)]
tboot: fix tboot memory mapping for 32b
This patch used fixmap to get TXT heap base/size and SINIT base/size
from TXT pub config registers (whose address starts from 0xfed20000),
and get DMAR table copy from TXT heap (whose address may start from
0x7d520000) for tboot, instead of using map_pages_to_xen(), which will
cause panic on x86_32.
Signed-off-by: Shane Wang <shane.wang@intel.com>
Keir Fraser [Tue, 22 Sep 2009 07:28:26 +0000 (08:28 +0100)]
x86: allow IRQs to work without APIC again
Non-IO-APIC IRQs must get 1:1 mapped between domain PIRQ and Xen IRQ.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Keir Fraser [Tue, 22 Sep 2009 07:27:10 +0000 (08:27 +0100)]
Improve CSE in grant table code
The grant table code had some particularly frequent repetitions of
mfn_to_page(), each time with the same input arguments. To help the
compiler (which can do only a limited job on CSE), this adds explicit
caching of the transformation result in a few places.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Keir Fraser [Tue, 22 Sep 2009 07:26:16 +0000 (08:26 +0100)]
Introduce new flavour of map_domain_page()
Introduce a variant of map_domain_page() directly getting passed a
struct page_info * argument, based on the observation that in many
places the argument to this function so far simply was the result of
page_to_mfn(). This is meaningful for the x86-64 case where
map_domain_page() really just is an invocation of mfn_to_virt(), and
hence the combined mfn_to_virt(page_to_mfn()) now represents a
needless round trip conversion compressed -> uncompressed ->
compressed of the MFN representation.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Keir Fraser [Tue, 22 Sep 2009 07:19:16 +0000 (08:19 +0100)]
x86: map M2P table sparsely
Avoid backing M2P table holes with memory, when those holes are large
enough to cover an exact multiple of large pages.
For the sake of saving and migrating guests, XENMEM_machphys_mfn_list
fills the holes in the array it returns with the MFN for the previous
range returned (thanks to Keir pointing out that it really doesn't
matter *what* MFN gets returned for invalid ranges). Using the most
recently encountered MFN (rather than e.g. always the first one)
represents an attempt to cut down on the number of references these
pages will get when they get mapped into a privileged domain's address
space.
This also allows for saving a couple of 2M pages even on certain
"normal" systems.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Keir Fraser [Tue, 22 Sep 2009 07:18:19 +0000 (08:18 +0100)]
x86: map frame table sparsely
Avoid backing frame table holes with memory, when those holes are
large enough to cover an exact multiple of large pages. This is based
on the introduction of a bit map, where each bit represents one such
range, thus allowing mfn_valid() checks to easily filter out those
MFNs that now shouldn't be used to index the frame table.
This allows for saving a couple of 2M pages even on "normal" systems.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
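A minimal sketch of the bitmap idea, assuming one bit per fixed-size group of MFNs. The group size and all function names below are hypothetical choices for illustration; the real patch sizes the ranges from large-page coverage of the frame table.

```python
GROUP = 8  # hypothetical number of MFNs covered by one bitmap bit

def build_valid_bitmap(ram_ranges, max_mfn):
    """Return one flag per group, set when any MFN in the group is RAM.

    `ram_ranges` is a list of (start, end) MFN pairs with `end` exclusive.
    """
    ngroups = (max_mfn + GROUP - 1) // GROUP
    bitmap = [False] * ngroups
    for start, end in ram_ranges:
        for g in range(start // GROUP, (end - 1) // GROUP + 1):
            bitmap[g] = True
    return bitmap

def mfn_valid(bitmap, mfn, max_mfn):
    """Reject MFNs that fall in a group with no backing frame table memory."""
    return 0 <= mfn < max_mfn and bitmap[mfn // GROUP]
```

Groups whose bit is clear correspond to holes, so their frame table slots never need to be backed with memory.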
Keir Fraser [Tue, 22 Sep 2009 07:16:49 +0000 (08:16 +0100)]
x86-64: reduce range spanned by 1:1 mapping and frame table indexes
Introduces a virtual space conserving transformation on the MFN thus
far used to index 1:1 mapping and frame table, removing the largest
range of contiguous bits (below the most significant one) which are
zero for all valid MFNs from the MFN representation, to be used to
index into those arrays, thereby cutting the virtual range these
tables must cover approximately by half with each bit removed.
Since this should account for hotpluggable memory (in order not to
require a rewrite when that gets supported), the determination of
which bits are candidates for removal must not be based on the E820
information, but instead has to use the SRAT. That in turn requires a
change to the ordering of steps done during early boot.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
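The transformation can be illustrated with a small sketch. It assumes the set of bits that may be non-zero in any valid MFN is already known and passed in as `used_mask`; how that mask is derived (from the SRAT, per the message above) is not modelled, and all function names are hypothetical.

```python
def find_hole(used_mask):
    """Return (shift, width) of the longest zero-bit run below the top set bit."""
    top = used_mask.bit_length() - 1
    best_shift, best_width = 0, 0
    i = 0
    while i < top:
        if (used_mask >> i) & 1:
            i += 1
            continue
        j = i
        while j < top and not (used_mask >> j) & 1:
            j += 1
        if j - i > best_width:
            best_shift, best_width = i, j - i
        i = j
    return best_shift, best_width

def compress(mfn, shift, width):
    """Drop `width` always-zero bits starting at bit `shift`."""
    low = mfn & ((1 << shift) - 1)
    high = mfn >> (shift + width)
    return (high << shift) | low

def decompress(idx, shift, width):
    """Inverse of compress(): re-insert the removed zero bits."""
    low = idx & ((1 << shift) - 1)
    high = idx >> shift
    return (high << (shift + width)) | low
```

Each removed bit halves the index range, which is why the virtual range the tables must cover shrinks by roughly half per bit.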
Keir Fraser [Tue, 22 Sep 2009 07:14:48 +0000 (08:14 +0100)]
x86-64: extend manageable memory range to 5Tb
Extend the virtual range reserved for the 1:1 mapping to cover 5Tb,
and make the virtual size of the frame table match whatever the
1:1 table can cover.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Keir Fraser [Tue, 22 Sep 2009 07:06:14 +0000 (08:06 +0100)]
vpmu_core2: support newer processors
Add code to get fully virtualized performance counters with newer
processors (which I'm able to test!). Most of the work is checking for
reserved bits in the control and counter registers.
Signed-off-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Keir Fraser [Tue, 22 Sep 2009 07:04:58 +0000 (08:04 +0100)]
x86 hvm: small cleanup in vpmu
Replace the special vpmu define LVTPC_HVM_PMU with the globally
used define PMU_APIC_VECTOR to avoid different names for the
same thing.
Signed-off-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Keir Fraser [Tue, 22 Sep 2009 07:02:50 +0000 (08:02 +0100)]
pv_ops: Build xen/master branch rather than xen-tip/master
Keir Fraser [Tue, 22 Sep 2009 07:02:01 +0000 (08:02 +0100)]
Keir Fraser [Tue, 22 Sep 2009 07:01:06 +0000 (08:01 +0100)]
x86: Fix memory leak in mce_wrmsr
Signed-off-by: Kazuhiro Suzuki <kaz@jp.fujitsu.com>
Keir Fraser [Tue, 22 Sep 2009 07:00:36 +0000 (08:00 +0100)]
x86 hvm: fix missing ticks bug of c/s 20218
With c/s 20218, timer ticks might be missed when IRQs of a timer are
queued. "Next scheduled time" is accumulated wrongly.
Thanks to Christoph for the report.
Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>
Reported-by: Christoph Egger <Christoph.Egger@amd.com>
Keir Fraser [Fri, 18 Sep 2009 13:45:40 +0000 (14:45 +0100)]
Revert 20221:fc94d586d02f
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Fri, 18 Sep 2009 07:46:32 +0000 (08:46 +0100)]
iommu: Fix pirq conflict issue when guest adopts per-cpu vector.
Recent Linux and Windows versions may adopt per-cpu vectors instead of
global vectors, so the same vector on different vcpus may correspond to
different interrupt sources. That is to say, vector to pirq is a 1:n
mapping, which the array msi_gvec_pirq cannot represent. Rework the
related logic; otherwise it may introduce strange issues.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Keir Fraser [Fri, 18 Sep 2009 07:44:38 +0000 (08:44 +0100)]
xend: Implement VIF.get_network
Signed-off-by: Masaki Kanno <kanno.masaki@jp.fujitsu.com>
Keir Fraser [Fri, 18 Sep 2009 07:29:46 +0000 (08:29 +0100)]
AMD IOMMU: Extend the loop counter for polling completion wait bit.
Signed-off-by: Wei Wang <wei.wang2@amd.com>
Keir Fraser [Fri, 18 Sep 2009 07:29:19 +0000 (08:29 +0100)]
AMD IOMMU: Remove unused definitions.
Signed-off-by: Wei Wang <wei.wang2@amd.com>
Keir Fraser [Fri, 18 Sep 2009 07:28:52 +0000 (08:28 +0100)]
AMD IOMMU: If interrupt remapping is disabled, then do not update
interrupt remapping table with IOAPIC write.
Signed-off-by: Wei Wang <wei.wang2@amd.com>
Keir Fraser [Fri, 18 Sep 2009 07:28:20 +0000 (08:28 +0100)]
AMD IOMMU: Allow enabling iommu debug output at run time.
The old compile-time option is removed.
Signed-off-by: Wei Wang <wei.wang2@amd.com>
Keir Fraser [Fri, 18 Sep 2009 07:27:38 +0000 (08:27 +0100)]
xend: Unlink VDI instances and VBD instances
When VBD instances are destroyed by the xm delete command, the VDI
instances keep their one-sided links to the destroyed VBD instances.
Signed-off-by: Masaki Kanno <kanno.masaki@jp.fujitsu.com>
Keir Fraser [Fri, 18 Sep 2009 07:26:53 +0000 (08:26 +0100)]
Revert 20194:582970a2d2dc
Excessively slows down domain creation in debug builds.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 16 Sep 2009 08:30:41 +0000 (09:30 +0100)]
x86 hvm: suspend platform timer emulation while its IRQ is masked
This patch removes a timer whose IRQ is masked from the vcpu's timer
list. It reduces the overhead of VM exits and VM context switches.
It also fixes a potential bug:
(1) VCPU#0: mask the IRQ of a timer. (ex. vioapic.redir[2].mask=1)
(2) VCPU#1: pt_timer_fn() is invoked by expiration of the timer.
(3) VCPU#1: pt_update_irq() is called but does nothing by
pt_irq_masked()==1.
(4) VCPU#1: sleep by halt.
(5) VCPU#0: unmask the IRQ of the timer.
After that, no one wakes up the VCPU#1.
IRQ of ISA is masked by:
- PIC's IMR
- IOAPIC's redir[0]
- IOAPIC's redir[N].mask
- LAPIC's LVT0
- LAPIC enabled/disabled
IRQ of LAPIC timer is masked by:
- LAPIC's LVTT
- LAPIC disabled
When any of the above changes, the corresponding vcpu is kicked and
suspended timer emulation is resumed.
In addition, this includes a small bug fix in
pt_adjust_global_vcpu_target().
Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>
Keir Fraser [Wed, 16 Sep 2009 08:29:17 +0000 (09:29 +0100)]
x86 hvm: don't set periodical timer again until its IRQ is delivered.
Modern Windows OSes (e.g. XP, 2003, 2008) never use the PIT timer,
nor cpu#0's LAPIC timer after boot.
Despite that, Xen emulates them busily, which is inefficient.
With this patch, setting a timer is deferred while its IRQ is masked.
The reasons why pt_timer_fn() simply calls vcpu_kick() are:
- checking with pt_irq_masked() would be duplicated, since
pt_update_irq() also does it.
- pt_timer_fn() is likely called on the same processor
as pt->vcpu->processor. Hence vcpu_kick() rarely sends an IPI.
Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>
Keir Fraser [Wed, 16 Sep 2009 08:26:04 +0000 (09:26 +0100)]
Remove "buffer half full" check from guest_console_write
Checks are made at a lower level in the serial code, and the policy
there is to drop rather than wait. So boot makes progress even when
serial hardware is problematic.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Keir Fraser [Wed, 16 Sep 2009 08:22:38 +0000 (09:22 +0100)]
xend: Consider ioemu devices for inactive managed domains
Signed-off-by: Masaki Kanno <kanno.masaki@jp.fujitsu.com>
Keir Fraser [Wed, 16 Sep 2009 08:21:56 +0000 (09:21 +0100)]
AMD IOMMU: Rework of interrupt remapping
1) Parsing IVRS special device entry in order to handle ioapic
remapping correctly.
2) Allocating per-device interrupt remapping tables instead of using a
global interrupt remapping table.
3) Some system devices like io-apic for north-bridge cannot be
discovered during pci device enumeration procedure. To remap interrupt
of those devices, device table update is split into 2 steps, so
that interrupt tables can be bound to device table entry earlier than
I/O page tables.
Signed-off-by: Wei Wang <wei.wang2@amd.com>
Keir Fraser [Wed, 16 Sep 2009 08:16:38 +0000 (09:16 +0100)]
x86: irq ratelimit
This patch adds an irq ratelimit feature. It temporarily masks
the (guest) interrupt if too many irqs are observed in a short
period (irq storm), to ensure responsiveness of Xen and other guests.
For now, the threshold can be adjusted at boot time using the
command-line option irq_ratelimit=xxx.
Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
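The rate-limiting policy can be sketched as a per-IRQ counter over a fixed time window. This is a simplified model under stated assumptions: the class name, the window handling and the unmask-on-new-window rule are hypothetical and do not reproduce the patch's actual mechanism.

```python
class IrqRateLimiter:
    """Hypothetical model: mask an IRQ once it exceeds a per-window budget."""

    def __init__(self, threshold, window):
        self.threshold = threshold  # max IRQs delivered per window
        self.window = window        # window length, in arbitrary time units
        self.window_start = 0
        self.count = 0
        self.masked = False

    def on_irq(self, now):
        """Record one IRQ at time `now`; return True if it is delivered."""
        if now - self.window_start >= self.window:
            # A new window begins: reset the count and unmask.
            self.window_start = now
            self.count = 0
            self.masked = False
        if self.masked:
            return False
        self.count += 1
        if self.count > self.threshold:
            self.masked = True      # storm detected: mask until next window
            return False
        return True
```

With a threshold of 3 per 10 time units, the fourth and fifth IRQs inside one window are dropped, and delivery resumes when the next window starts.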
Keir Fraser [Wed, 16 Sep 2009 07:55:23 +0000 (08:55 +0100)]
x86 hvm: Guests should scan CPUID range 40000000-4000ff00 for Xen leaves.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
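A guest-side scan of that range might look like the sketch below. The `cpuid` argument is a hypothetical caller-supplied function returning (eax, ebx, ecx, edx) for a leaf, and the 0x100 stride is an assumption inferred from the range endpoints. The "XenVMMXenVMM" signature is read from EBX, ECX and EDX in that order.

```python
import struct

XEN_SIGNATURE = b"XenVMMXenVMM"

def find_xen_base(cpuid):
    """Scan 0x40000000..0x4000ff00 in steps of 0x100 for the Xen signature.

    `cpuid` is a hypothetical callable: leaf -> (eax, ebx, ecx, edx).
    Returns the base leaf, or None when no Xen leaves are found.
    """
    for base in range(0x40000000, 0x40010000, 0x100):
        eax, ebx, ecx, edx = cpuid(base)
        # The 12-byte signature is the little-endian bytes of EBX, ECX, EDX.
        if struct.pack("<III", ebx, ecx, edx) == XEN_SIGNATURE:
            return base
    return None
```

Scanning rather than hard-coding 0x40000000 lets the guest find the Xen leaves even when another interface occupies the first slot.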
Keir Fraser [Tue, 15 Sep 2009 09:08:12 +0000 (10:08 +0100)]
xenoprof: force use of architectural perfmon instead of the CPU
specific event set, which may not yet be supported by the oprofile
user space tool.
Signed-off-by: Yang Zhang <yang.zhang@intel.com>
Signed-off-by: Yang Xiaowei <xiaowei.yang@intel.com>
Keir Fraser [Tue, 15 Sep 2009 09:03:16 +0000 (10:03 +0100)]
xenoprof: support Intel's architectural perfmon registers.
One benefit is that more perfmon counters can be used on Nehalem.
Signed-off-by: Yang Zhang <yang.zhang@intel.com>
Signed-off-by: Yang Xiaowei <xiaowei.yang@intel.com>
Keir Fraser [Tue, 15 Sep 2009 09:02:15 +0000 (10:02 +0100)]
xenoprof: add support for Core i7 and Atom.
Signed-off-by: Yang Zhang <yang.zhang@intel.com>
Signed-off-by: Yang Xiaowei <xiaowei.yang@intel.com>
Keir Fraser [Tue, 15 Sep 2009 08:54:16 +0000 (09:54 +0100)]
x86: Free unused pages of per-cpu data.
As well as freeing data pages for impossible cpus, we also free pages
of all other cpus which contain no actual data (because of too-large
statically-defined PERCPU_SHIFT).
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Tue, 15 Sep 2009 08:52:26 +0000 (09:52 +0100)]
x86: Re-increase size of percpu area
Per-cpu vector code adds a lot of percpu data. With perfc also
enabled, one page per cpu is no longer enough.
Signed-off-by: Yang Xiaowei <xiaowei.yang@intel.com>
Keir Fraser [Tue, 15 Sep 2009 08:46:08 +0000 (09:46 +0100)]
p2m: Fix debug build.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Tue, 15 Sep 2009 08:26:52 +0000 (09:26 +0100)]
Keir Fraser [Tue, 15 Sep 2009 08:26:08 +0000 (09:26 +0100)]
xend: Fix VDI.get_record
VDI.get_record does not return the correct records of a VDI.
This patch makes it return the correct records.
Signed-off-by: Masaki Kanno <kanno.masaki@jp.fujitsu.com>
Keir Fraser [Tue, 15 Sep 2009 08:25:41 +0000 (09:25 +0100)]
x86 mce: Fix panic in mcheck_mca_logout
I hit a panic in mcheck_mca_logout().
MSR_IA32_MCi_ADDR might hold values other than a machine
address. A FATAL PAGE FAULT occurred when a non-existent address was
passed to maddr_get_owner().
Signed-off-by: Kazuhiro Suzuki <kaz@jp.fujitsu.com>
Keir Fraser [Tue, 15 Sep 2009 08:24:59 +0000 (09:24 +0100)]
Vt-d: queued invalidation cleanup
This patch cleans up queued invalidation, including the round wrap
check, multiple polling status and other minor changes. This version
uses a local variable as the polling address, which is cleaner.
Signed-off-by: Zhai Edwin <edwin.zhai@intel.com>
Keir Fraser [Tue, 15 Sep 2009 08:23:44 +0000 (09:23 +0100)]
x86: Remove PSE flag from PV guest CR4 and CPUID.
From: Dave McCracken <dcm@mccr.org>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Tue, 15 Sep 2009 08:21:34 +0000 (09:21 +0100)]
pygrub: Correct pygrub return value
This patch corrects the return value of pygrub's checkPassword()
function. It did not return False at the end of the function; it
returned None instead. Since None is also falsy, it worked fine, and
this is most likely just a cosmetic issue.
Also, the missing () were added in checkPassword() when
calling hasPassword, and an unnecessary comment was removed.
Signed-off-by: Michal Novotny <minovotn@redhat.com>
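Both points can be shown with a short, self-contained sketch. The function and class names below are hypothetical stand-ins and are not pygrub's actual code.

```python
def check_password_implicit(entered, expected):
    """Mimics the old behaviour: no explicit return on the failure path."""
    if entered == expected:
        return True
    # Falling off the end makes Python return None.

def check_password_explicit(entered, expected):
    """Mimics the fixed behaviour: always returns a bool."""
    if entered == expected:
        return True
    return False

class Config:
    """Hypothetical stand-in for an object with a hasPassword-style method."""
    def has_password(self):
        return False

cfg = Config()
# Without (), the bound method object itself is tested, and it is truthy.
method_is_truthy = bool(cfg.has_password)
# With (), the method is actually called and its result is tested.
call_result = cfg.has_password()
```

None and False behave the same in an `if` test, which is why the missing return was only cosmetic, whereas the missing () changes which branch is taken.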
Keir Fraser [Tue, 15 Sep 2009 08:20:47 +0000 (09:20 +0100)]
xend: Receive error message of migration from destination server
Before this patch, xm migrate showed the error message below.
I caused the error deliberately: I prepared a destination server
with insufficient free memory, and then tried to migrate a VM to
it. As expected, the command failed. However, the error message
was not the one I expected. I would like xm to show the error
message from the destination server when the error occurs there.
# xm migrate --live vm3 bx339
Error: (107, 'Transport endpoint is not connected')
Usage: xm migrate <Domain> <Host>
Migrate a domain to another machine.
Options:
-h, --help Print this help.
-l, --live Use live migration.
-p=portnum, --port=portnum
Use specified port for migration.
-n=nodenum, --node=nodenum
Use specified NUMA node on target.
-s, --ssl Use ssl connection for migration.
If a destination server sends an error message, this patch shows
the error message. For example, the following error message is
shown if free memory of the destination server is insufficient.
# xm migrate --live vm3 bx339
Error: I need 262144 KiB, but dom0_min_mem is 716800 and shrinking to
716800 KiB would leave only 50368 KiB free. (from bx339)
Usage: xm migrate <Domain> <Host>
Migrate a domain to another machine.
Options:
-h, --help Print this help.
-l, --live Use live migration.
-p=portnum, --port=portnum
Use specified port for migration.
-n=nodenum, --node=nodenum
Use specified NUMA node on target.
-s, --ssl Use ssl connection for migration.
Signed-off-by: Masaki Kanno <kanno.masaki@jp.fujitsu.com>
Keir Fraser [Tue, 15 Sep 2009 08:19:23 +0000 (09:19 +0100)]
Replace magic number for NULL (~0) with PAGE_LIST_NULL
...in the page_list_* functions.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Keir Fraser [Tue, 15 Sep 2009 08:16:52 +0000 (09:16 +0100)]
blktap2: Fix off-by-one error in driver lookup
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Tue, 15 Sep 2009 08:16:19 +0000 (09:16 +0100)]
PoD: Implement PoD for EPT
This patch implements the populate-on-demand functionality for EPT.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Keir Fraser [Tue, 15 Sep 2009 08:15:14 +0000 (09:15 +0100)]
p2m: Reorganize p2m_pod_demand_populate in preparation for EPT PoD patch
p2m_pod_demand_populate is too non-EPT-p2m-centric. Reorganize code
to have a p2m-specific call that wraps a generic PoD call.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Keir Fraser [Tue, 15 Sep 2009 08:14:36 +0000 (09:14 +0100)]
EPT: Clean up some code
Clean up and reorganize some code in preparation for adding
populate-on-demand functionality.
Should be no functional difference.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Keir Fraser [Tue, 15 Sep 2009 08:13:38 +0000 (09:13 +0100)]
PoD: Check p2m assumption in debug builds
The PoD code assumes that if:
* A page is in a domain's p2m table
* And it's owned by the domain
* And it's not a xenheap page
then:
* It's on the domain's page list.
This patch adds a check for this assumption when debug=y.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
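The assumption is a plain implication, which a debug check could express as in the sketch below. The dictionary keys and the function name are hypothetical and chosen only for illustration.

```python
def pod_assumption_holds(page, domain_page_list):
    """Check: in p2m, owned by the domain, not xenheap => on the page list.

    `page` is a hypothetical dict describing one page; `domain_page_list`
    is the set of MFNs on the domain's page list.
    """
    premises = (page["in_p2m"] and page["owned_by_domain"]
                and not page["is_xenheap"])
    return (not premises) or page["mfn"] in domain_page_list
```

A debug build would assert this for each page it examines; the check passes trivially whenever any of the three premises is false.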
Keir Fraser [Tue, 15 Sep 2009 08:13:01 +0000 (09:13 +0100)]
PoD: Fix debug build.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Tue, 15 Sep 2009 08:09:18 +0000 (09:09 +0100)]
PoD: Don't reclaim xenheap pages in zero-sweep
Don't reclaim xenheap-allocated pages in the zero-sweep. This avoids
grabbing things like grant tables mapped in the p2m.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Keir Fraser [Tue, 15 Sep 2009 08:08:36 +0000 (09:08 +0100)]
PoD: Scrub pages before adding to the cache
Neither memory from the allocator nor memory from
the balloon driver is guaranteed to be zero. Scrub it
before adding to the cache.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Keir Fraser [Tue, 15 Sep 2009 08:06:46 +0000 (09:06 +0100)]
passthrough: remove pointless error checks
map_domain_page() cannot return NULL. And if it could, both instances
changed here would leak memory in such a case.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Keir Fraser [Wed, 9 Sep 2009 15:39:41 +0000 (16:39 +0100)]
x86: add an extra check when validating a huge pv L2 entry
While get_page_and_type_from_pagenr() (through get_page_from_pagenr())
does the needed mfn_valid() check, get_data_page() doesn't and,
being passed a struct page_info pointer, really expects its caller(s)
to do so.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 9 Sep 2009 15:32:25 +0000 (16:32 +0100)]
Fix an obviously inverted check in offline_page()
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Keir Fraser [Wed, 9 Sep 2009 14:34:37 +0000 (15:34 +0100)]
Fix typo in c/s 20158:f9ce5858.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 9 Sep 2009 14:33:30 +0000 (15:33 +0100)]
xm,xend: Make cpus parameter available
When I started a VM by using the xm create command, the cpus parameter
in the VM configuration file was ignored. The problem occurred only
when I used XenAPI. This patch makes the parameter take effect.
Signed-off-by: Masaki Kanno <kanno.masaki@jp.fujitsu.com>
Keir Fraser [Wed, 9 Sep 2009 14:32:30 +0000 (15:32 +0100)]
mount /proc/xen in init.d/xen
pvops dom0 kernels have a separate xenfs which has to be mounted on
/proc/xen. Systems with older configurations don't have xenfs listed
in fstab, and it can sometimes make sense to keep it that way (for
example, if the dom0 wants to boot a native-only kernel too).
The attached patch to the script which ends up in /etc/init.d/xend
mounts /proc/xen if it appears to be necessary.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Keir Fraser [Tue, 8 Sep 2009 14:11:52 +0000 (15:11 +0100)]
x86: Fix typo in p2m_pod_set_cache_target
Fix typo in p2m_pod_set_cache_target by defining (1<<9) as
SUPERPAGE_PAGES
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Keir Fraser [Tue, 8 Sep 2009 14:11:18 +0000 (15:11 +0100)]
Keir Fraser [Tue, 8 Sep 2009 14:10:59 +0000 (15:10 +0100)]
xend: Fix syntax error
Signed-off-by: Simon Horman <horms@verge.net.au>
Keir Fraser [Tue, 8 Sep 2009 14:10:31 +0000 (15:10 +0100)]
VT-d: prevent dom0 from using VT-d HW
pv-ops dom0 contains the upstream Linux VT-d driver, and will try to
enable it when VT-d is set in the kernel config file. VT-d should not
be enabled in dom0.
Currently Xen already zaps the ACPI DMAR signature to prevent dom0 from
using VT-d HW when VT-d is enabled for Xen. But when VT-d is not
enabled for Xen, and VT-d is set in the pv-ops kernel config file,
pv-ops dom0 will try to enable it. This results in a pv-ops dom0 boot
failure. This patch prevents dom0 from using VT-d HW whether VT-d is
enabled or disabled for Xen.
Signed-off-by: Weidong Han <weidong.han@intel.com>
Keir Fraser [Mon, 7 Sep 2009 13:26:06 +0000 (14:26 +0100)]
Fix etags invocation
Don't fail in the case where 'etags' isn't Exuberant Ctags
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
Keir Fraser [Mon, 7 Sep 2009 12:52:48 +0000 (13:52 +0100)]
xend: passthrough: add an option pci-passthrough-strict-check
Currently when assigning a device to an HVM guest, we use the strict
check by default. (For PV guests we use the loose check
automatically if necessary.)
When we assign a device to an HVM guest, if we hit the co-assignment
issues or the ACS issue (see changeset 20081:4a517458406f), we could
try changing the option to 'no'. However, we have to realize this
may incur security issues, and we can't be sure the device assignment
will really work properly even after we do this.
The option is located in /etc/xen/xend-config.sxp:
(pci-passthrough-strict-check yes)
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
Keir Fraser [Mon, 7 Sep 2009 12:52:17 +0000 (13:52 +0100)]
vt-d: don't treat IOAPIC RTE of dest_SMI type specially.
We also need to create IRTE for it since we enable EIM and clear CFI,
or else, the IOAPIC RTE's interrupt message would be blocked by IR unit.
In io_apic_read_remap_rte(), we now use
"apic_pin_2_ir_idx[apic][ioapic_pin]"
rather than "(remap_rte->index_15 << 15) | remap_rte->index_0_14" to
avoid the "interrupt remapping table out of bound error".
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
Keir Fraser [Mon, 7 Sep 2009 12:51:55 +0000 (13:51 +0100)]
vt-d: some small fixes to apic_pin_2_ir_idx
1) apic_pin_2_ir_idx should be int** rather than unsigned int**,
because we use the int -1 to indicate that the related IRTE index is
not allocated.
2) shouldn't re-init apic_pin_2_ir_idx when resuming from S3.
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
Keir Fraser [Mon, 7 Sep 2009 12:51:37 +0000 (13:51 +0100)]
x86: Some cleanups for apic_write, apic_read, apic_wrmsr, apic_rdmsr
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
Keir Fraser [Mon, 7 Sep 2009 12:51:19 +0000 (13:51 +0100)]
vt-d: replace the gdprintk with dprintk since it isn't in guest context.
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
Keir Fraser [Mon, 7 Sep 2009 12:50:55 +0000 (13:50 +0100)]
pygrub: trap exception when python module import fails
Fix the issue when importing the 'crypt' module or calling crypt.crypt
fails in pygrub. The exception is written on the same line as the
"Failed!" message, but only if there is an exception. If there is no
exception, we don't bother users with details (probably the password
they entered was wrong), so we just display the "Failed!" message.
Also, the code for hasPassword() was rewritten not to use a
try/except block.
Signed-off-by: Michal Novotny <minovotn@redhat.com>
Keir Fraser [Mon, 7 Sep 2009 12:49:35 +0000 (13:49 +0100)]
vt-d: avoid obtaining iommu->register_lock too early in
dma_msi_set_affinity()
If set_desc_affinity() fails, the current code doesn't release the
spinlock. We should obtain the lock at a later place.
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
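The pattern behind this fix, taking the lock only after the step that can fail, can be sketched as below. The lock, function and parameter names are hypothetical; Python's threading.Lock stands in for the spinlock.

```python
import threading

register_lock = threading.Lock()  # hypothetical stand-in for the spinlock

def set_affinity(validate, apply):
    """Run the fallible step first, so its failure path holds no lock.

    `validate` returns a destination or None on failure; `apply` performs
    the register update. Both are hypothetical callables.
    """
    dest = validate()        # may fail; the lock is not held yet
    if dest is None:
        return False         # early return needs no unlock
    with register_lock:      # held only around the register update
        apply(dest)
    return True
```

Because the early return happens before the lock is taken, there is no error path on which the lock could be left held.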
Keir Fraser [Mon, 7 Sep 2009 08:00:21 +0000 (09:00 +0100)]
xend: Enable to set config variables in /etc/sysconfig/xend
The attached patch makes it possible to set the environment variables
for xend in /etc/sysconfig/xend.
There are four variables.
XENCONSOLED_TRACE=[none|guest|hv|all]
XENSTORED_ROOTDIR=/var/lib/xenstored
XENSTORED_TRACE=[yes|on|1]
XENBACKENDD_DEBUG=[yes|on|1]
XENCONSOLED_TRACE and XENSTORED_ROOTDIR take strings passed as each
command's options. If these variables hold non-empty strings, they
are exported.
If XENSTORED_TRACE and XENBACKENDD_DEBUG are set to "yes", "on"
or "1", they are exported.
From: Kazuhiro SUZUKI <kaz@jp.fujitsu.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
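The two export rules can be modelled as follows. This is a sketch of the decision logic only, not the init script itself: the function name is hypothetical, and exporting the variable's own value for the two boolean-style settings is an assumption.

```python
def exports_from_sysconfig(cfg):
    """Return the variables to export, given parsed sysconfig settings.

    `cfg` is a hypothetical dict of variable name -> string value.
    """
    env = {}
    # String-valued variables: export when the string is non-empty.
    for name in ("XENCONSOLED_TRACE", "XENSTORED_ROOTDIR"):
        if cfg.get(name):
            env[name] = cfg[name]
    # Boolean-style variables: export only for "yes", "on" or "1".
    # Exporting the original value is an assumption of this sketch.
    for name in ("XENSTORED_TRACE", "XENBACKENDD_DEBUG"):
        if cfg.get(name) in ("yes", "on", "1"):
            env[name] = cfg[name]
    return env
```

For example, an empty XENSTORED_ROOTDIR and XENBACKENDD_DEBUG=no are both left out of the exported environment.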
Keir Fraser [Mon, 7 Sep 2009 07:48:12 +0000 (08:48 +0100)]
xend: Revert c/s 17536 which breaks PV passthru of MSI-X devices.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Mon, 7 Sep 2009 07:46:46 +0000 (08:46 +0100)]
Add the support of x2apic logical cluster mode.
Add a xen boolean parameter 'x2apic'.
Add a xen boolean parameter 'x2apic_phys'(by default, we use logical
cluster mode).
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
Keir Fraser [Mon, 7 Sep 2009 07:46:03 +0000 (08:46 +0100)]
vt-d: use 32-bit Destination ID when Interrupt Remapping with EIM is
enabled
When x2APIC and Interrupt Remapping(IR) with EIM are enabled, we
should use 32-bit Destination ID for IOAPIC and MSI.
We implemented the IR support in xen by hooking functions like
io_apic_write(), io_apic_modify() and write_msi_message(). As a
result, in the hook functions in intremap.c, we can only see the 8-bit
dest id rather than the 32-bit id, so we can't set an IR table entry
that requires a 32-bit dest id.
To solve the issue thoroughly, we need to find every place in io_apic.c
and msi.c that could write an ioapic RTE or a device's msi message and
explicitly handle the 32-bit dest id carefully (namely, when genapic
is x2apic, cpu_mask_to_apic could return a 32-bit value); and we have
to change the iommu_ops->{.update_ire_from_apic, .update_ire_from_msi}
interfaces. We may have to write an over-1000-LOC patch for this.
Instead, we could use a workaround:
1) for ioapic, in the struct IO_APIC_route_entry, we could use a new
"dest32" to refer to the dest field;
2) for msi, in the struct msi_msg, we could add a new "u32 dest".
And in intremap.c, if x2apic_enabled, we use the new names to refer to
the dest fields.
We can improve this in future.
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
Keir Fraser [Mon, 7 Sep 2009 07:44:50 +0000 (08:44 +0100)]
vt-d: enhance the support of Interrupt Remapping EIM and x2APIC
1) Clear Interrupt Remapping(IR) unit's CFI (Compatibility Format
Interrupt) to enhance security;
2) Move the iommu_setup() ahead and put it before we begin to use
IOAPIC so we can make sure after we enable Interrupt Remapping, the
later IOAPIC (and MSI) initialization would setup IOAPIC RTEs (and
MSI) with remappable format;
3) Enable x2APIC only when all VT-d engines support IR with EIM
(Extended Interrupt Mode). EIM enables external devices to deliver
interrupts to logical processor with >8-bit APIC ID.
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
Keir Fraser [Mon, 7 Sep 2009 07:44:00 +0000 (08:44 +0100)]
x86/mmcfg: misc adjustments
- fix the mapping range (end_bus_number is inclusive)
- fix the mapping base address (shifting segment by 22 was set for
overlapping mappings; assuming the goal was to reduce the virtual
space used when less than 256 busses are present on all segments,
adding logic to determine the smallest possible shift value)
- fix PCI_MCFG_VIRT_END, and actually use it to avoid creating
mappings outside the designated range
- fix address calculations (segment numbers must be converted to long
to avoid truncation)
- add a way (command line option) to suppress the use of mmconfig as
well as to actually use the AMD Fam10 special code
- correct __init annotations
- use xmalloc()/xmalloc_array() in favor of xmalloc_bytes()
- eliminate dead code and data
Signed-off-by: Jan Beulich <jbeulich@novell.com>
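Two of the fixes in the list above, the inclusive end bus and the smallest per-segment shift, come down to simple arithmetic. The sketch below works in bytes and relies on each PCI bus occupying 1 MiB of MMCONFIG space; the function names are hypothetical, and the patch's own units and constants may differ.

```python
BUS_SHIFT = 20  # each PCI bus occupies 1 MiB of MMCONFIG space

def mapping_size(start_bus, end_bus):
    """Bytes to map for buses start_bus..end_bus, with end_bus inclusive."""
    return (end_bus - start_bus + 1) << BUS_SHIFT

def smallest_shift(max_buses):
    """Smallest per-segment shift (in bytes) that fits `max_buses` buses."""
    shift = BUS_SHIFT
    while (1 << shift) < (max_buses << BUS_SHIFT):
        shift += 1
    return shift
```

Forgetting the +1 would leave the last bus unmapped, and a shift smaller than the value computed here would make adjacent segments' mappings overlap.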
Keir Fraser [Mon, 7 Sep 2009 07:43:14 +0000 (08:43 +0100)]
amd iommu: Remove a useless flag and fix I/O page fault for hvm
passthru devices.
Signed-off-by: Wei Wang <wei.wang2@amd.com>
Keir Fraser [Mon, 7 Sep 2009 07:42:50 +0000 (08:42 +0100)]
amd iommu: Cleanup initialization functions and fix a fatal page fault
caused by out-of-bounds access to irq_to_iommu array.
Signed-off-by: Wei Wang <wei.wang2@amd.com>
Keir Fraser [Mon, 7 Sep 2009 07:41:45 +0000 (08:41 +0100)]
x86-64/mmcfg: add explicit support for nVidia MCP55
This is a simple port from Linux 2.6.31-rc8.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Keir Fraser [Mon, 7 Sep 2009 07:41:00 +0000 (08:41 +0100)]
x86: convert frame_table to a #define
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Keir Fraser [Mon, 7 Sep 2009 07:40:33 +0000 (08:40 +0100)]
Tidy evtchn keyhandler a little
Get rid of all the -1s and label the pending and masked columns.
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
Keir Fraser [Mon, 7 Sep 2009 07:38:39 +0000 (08:38 +0100)]
xend: passthrough: fix physdev_map_pirq invocation
For those devices not having INTx (like VFs), avoid calling map_pirq,
otherwise the guest cannot be started successfully.
Also avoid calling this hypercall for hvm guest, this is done in the
device model.
Signed-off-by: Qing He <qing.he@intel.com>
Keir Fraser [Mon, 7 Sep 2009 07:37:58 +0000 (08:37 +0100)]
Fix some issues for HVM log dirty:
* Add the necessary dirty logging in qemu to avoid guest errors under
intensive disk access during live migration
* Replace the shared memory between qemu and the migration tools with a
newly added hypercall, which is clean and simple
Signed-Off-By: Zhai, Edwin <edwin.zhai@intel.com>
Keir Fraser [Fri, 4 Sep 2009 07:43:05 +0000 (08:43 +0100)]
x86: Fix PoD cache size when decreasing memory
Certain paths through p2m_pod_decrease_reservation() fail to reduce
the size of the PoD cache if the number of outstanding entries is less
than the size of the cache. Rearrange so this doesn't happen.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Keir Fraser [Fri, 4 Sep 2009 07:42:10 +0000 (08:42 +0100)]
xend: Support "bootloader" mode for "drbd:" devices
To be able to use "bootloader" on drbd devices the following changes
need to be made:
*) Translation of device name
_parse_uname (used by blkdev_uname_to_file, which in turn is used by
_configureBootloader in XendDomainInfo) needs to be able to resolve
drbd resources to the corresponding block device to feed to the
configured bootloader.
*) Activation of drbd device
If the drbd device isn't in Primary mode when the bootloader tries to
fetch the kernel and initrd, the start of the DomU will fail. To
prevent this the given drbd device will be made Primary before the
bootloader gets executed.
A note on the naming of drbd resources: drbd mostly uses resource names
in its userland tools. Because of that, drbd VBDs, if configured with
the "drbd:" type, should always use the drbd resource name as
suggested by the drbd documentation at
http://www.drbd.org/users-guide-emb/s-xen-configure-domu.html. My
patches assume that the VBDs are named accordingly.
Signed-off-by: Michael Renner <michael.renner@geizhals.at>
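The two steps in this commit message (promote the drbd resource, then hand its block device to the bootloader) can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the patch itself: bootloader_device, resolve_drbd and make_primary are hypothetical names, and the two callbacks stand in for whatever drbd tooling actually performs the lookup and the promotion.

```python
def bootloader_device(uname, resolve_drbd, make_primary):
    """Turn a 'type:name' disk name into a path the bootloader can read.

    resolve_drbd -- hypothetical callback: resource name -> device path
    make_primary -- hypothetical callback: promotes a resource to Primary
    """
    disk_type, sep, name = uname.partition(":")
    if not sep or not name:
        raise ValueError("malformed disk name: %r" % uname)
    if disk_type == "phy":
        # Physical devices may be given relative to /dev.
        return name if name.startswith("/") else "/dev/" + name
    if disk_type == "drbd":
        # Promote first: the bootloader cannot read a non-Primary device.
        make_primary(name)
        return resolve_drbd(name)
    raise ValueError("unsupported disk type: %r" % disk_type)


# Usage with stand-in callbacks; the resource table is made up.
promoted = []
table = {"r0": "/dev/drbd0"}
print(bootloader_device("drbd:r0", table.__getitem__, promoted.append))
print(promoted)
```

Passing the drbd operations in as callbacks keeps the sketch free of any assumption about the real drbd command-line interface.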
Keir Fraser [Fri, 4 Sep 2009 07:34:45 +0000 (08:34 +0100)]
xend: fix domain_migrate
When the guest (a pv-on-hvm guest that cannot suspend) reboots during
live migration, the disconnection on the source side is not
transmitted to the destination side. As a result, the error
processing on the destination side is not executed.
Signed-off-by: Tomonari Horikoshi <t.horikoshi@jp.fujitsu.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 3 Sep 2009 08:51:37 +0000 (09:51 +0100)]
vt-d: fix Dom0 S3 resume.
When resuming from Dom0 S3, here 'irq' is -1, so we can't use it at
all. We should always use iommu->irq.
With the patch applied on the current tip 20153 and using the 2.6.18
Dom0, Dom0 S3 works fine (at least on my DQ35).
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
Keir Fraser [Thu, 3 Sep 2009 08:50:46 +0000 (09:50 +0100)]
x86 vpt: Small performance fixes.
1. Once a one-shot timer has fired, stop raising its IRQ repeatedly forever.
2. Test pending_intr_nr before pt_irq_masked(), as it is cheaper.
Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>
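Point 2 is an ordering change in a short-circuit test: check the cheap counter first so the more expensive mask lookup runs only for timers that actually have something pending. The following is a rough Python model of that idea, not the hypervisor's C code: first_deliverable, the dictionary fields other than pending_intr_nr, and the irq_masked callback (standing in for pt_irq_masked()) are assumptions made for illustration.

```python
def first_deliverable(timers, irq_masked):
    """Return the first timer with a pending, unmasked interrupt, or None.

    irq_masked -- callback standing in for the costlier mask check
    """
    for pt in timers:
        # Cheap counter test first; 'and' skips irq_masked() when it fails.
        if pt["pending_intr_nr"] > 0 and not irq_masked(pt):
            return pt
    return None


# Usage: record which timers reach the expensive check.
checked = []

def irq_masked(pt):
    checked.append(pt["name"])
    return pt["masked"]

timers = [
    {"name": "a", "pending_intr_nr": 0, "masked": False},
    {"name": "b", "pending_intr_nr": 2, "masked": True},
    {"name": "c", "pending_intr_nr": 1, "masked": False},
]
print(first_deliverable(timers, irq_masked)["name"], checked)
```

Timer "a" never reaches the mask check because its counter is zero, which is the saving the commit message refers to.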
Keir Fraser [Thu, 3 Sep 2009 08:49:41 +0000 (09:49 +0100)]
xm: Add "tap2" to attach blocktap disks to VM
I detected a problem when using XenAPI: when I started a VM with the
xm create command, blocktap disks were not attached to the VM.
Signed-off-by: Masaki Kanno <kanno.masaki@jp.fujitsu.com>
Keir Fraser [Thu, 3 Sep 2009 06:37:27 +0000 (07:37 +0100)]
x86: com devices' irqaction shouldn't be freed.
The IRQs of serial devices are initialized early in Xen boot and their
irqaction is not allocated from the heap, so it does not need to be
freed in the release-irq logic.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 3 Sep 2009 06:29:29 +0000 (07:29 +0100)]
[IOMMU] dynamic VTd page table for HVM guest
This patch makes the HVM guest's VT-d page table dynamic, as is already
done for PV guests, which avoids the overhead of maintaining the page
table until a PCI device is actually assigned to the HVM guest.
Signed-Off-By: Zhai, Edwin <edwin.zhai@intel.com>
Keir Fraser [Wed, 2 Sep 2009 15:15:05 +0000 (16:15 +0100)]
libxenguest: Remove unused static inline function is_loadable_phdr()
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Keir Fraser [Wed, 2 Sep 2009 15:12:41 +0000 (16:12 +0100)]
Enable some SCSI drivers in pvops kernel config
Enables a couple of SCSI host controllers which are found in our test
farm but not enabled in the default upstream kernel. The new drivers
are compiled as modules which is pretty harmless so this should be
safe.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Keir Fraser [Wed, 2 Sep 2009 10:40:04 +0000 (11:40 +0100)]
x86: Remove the redundant logic in set_msi_affinity
Remove the redundant logic in set_msi_affinity. It was introduced
accidentally, probably by a mistake when I generated the patch.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>